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Getting Started "28 


oe 
+ 


Azure Database MySQL SQL Database Cosmos DB Synapse Analytics 


Azure has 200+ services. Exam expects you to understand 40+ services. 


e Exam tests your decision making abilities: 
= Which data format will you use in which situation? 


= Which Azure data store will you use in which situation? 
e This course is designed to help you make these choices 
Our Goal : Help you get certified and start your cloud journey with Azure 


How do you put your best foot forward? 


e Challenging certification - Expects 
you to understand and REMEMBER a 
number of services 


e Astime passes, humans forget things. 
e How do you improve your chances of 
remembering things? 
= Active learning - think and take notes 


= Review the presentation every once in a 
while 


100% 
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Our Approach “28 
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e Three-pronged approach to reinforce 
concepts: 
= Presentations (Video) 


= Demos (Video) 


= Two kinds of quizzes: 
o Text quizzes 


o Video quizzes 


e (Recommended) Take your time. Do 
not hesitate to replay videos! 


e (Recommended) Have Fun! 


inutes 


Getting Started - Azure 


Before the Cloud - Example 1 - Online Shopping App "28 
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e Challenge: 


= Peak usage during holidays and weekends 
= Less load during rest of the time 


e Solution (before the Cloud): 


= Procure (Buy) infrastructure for peak load 
o What would the infrastructure be doing during periods of low loads? 


Before the Cloud - Example 2 - Startup 


12346 78 gpm nH 0:002%345678901“4L:%!02%3a456J789:0 


e Challenge: 


= |t suddenly becomes popular. 
= How to handle the sudden increase in load? 


e Solution (before the Cloud): 


= Procure (Buy) infrastructure assuming they would be successful 
o What if they are not successful? 
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Before the Cloud - Challenges "28 
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12346 78 gpm nRr12345 678 GM HR12345678 9§ 0 


e High cost of procuring infrastructure 

Needs ahead of time planning (Can you guess the future?) 

e Low infrastructure utilization (PEAK LOAD provisioning) 

Dedicated infrastructure maintenance team (Can a startup afford it?) 


Silver Lining in the Cloud 128 


e How about provisioning (renting) resources when you want 


them and releasing them back when you do not need them? 
= On-demand resource provisioning 
= Also called Elasticity 


Cloud - Advantages 


e Trade "capital expense" for "variable expense" 

e Benefit from massive economies of scale 

e Stop guessing capacity 

e Stop spending money running and maintaining data centers 
e "Go global" in minutes 


Microsoft Azure 


One of the leading cloud service providers 
Provides 200+ services 
Reliable, secure and cost-effective 


The entire course is all about Azure. You will learn it as we go 
further. 


Best path to learn Azure! 28 


Advisor Machine Learning Cosmos DB Azure DevOps 


e Cloud applications make use of multiple Azure services. 
e There is no single path to learn these services independently. 
e HOWEVER, we've worked out a simple path! 


Setting up Azure Account 


e Create Azure Account 
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Regions and Zones 


Regions and Zones 28 
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London Region 


Application 


Corporate Data Center 


e Imagine that your application is deployed in a data center in London 
e What would be the challenges? 


= Challenge 1: Slow access for users from other parts of the world (high latency) 


= Challenge 2 : What if the data center crashes? 
o Your application goes down (low availability) 


Multiple data centers 128 
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Application Application 


Corporate Data Center 1 Corporate Data Center 2 


e Let's add in one more data center in London 
e What would be the challenges? 


= Challenge 1: Slow access for users from other parts of the world 
= Challenge 2 (SOLVED) : What if one data center crashes? 


o Your application is still available from the other data center 


= Challenge 3 : What if entire region of London is unavailable? 
o Your application goes down 


Multiple regions 


Application 


Application Application 


Corporate Data Center 1 Corporate Data Center 2 Corporate Data Center 1 Corporate Data Center 2 


e Let's add a new region : Mumbai 


e What would be the challenges? 
= Challenge 1 (PARTLY SOLVED): Slow access for users from other parts of the world 


o You can solve this by adding deployments for your applications in other regions 


= Challenge 2 (SOLVED) : What if one data center crashes? 


o Your application is still live from the other data centers 


= Challenge 3 (SOLVED) : What if entire region of London is unavailable? 


o Your application is served from Mumbai 
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Regions 


e Imagine setting up data centers in 
different regions around the world 
= Would that be easy? 
e (Solution) Azure provides 60+ regions 
around the world 
= Expanding every year 
e Region : Specific geographical 
location to host your resources 
e Advantages: 
= High Availability 
= Low Latency 
= Global Footprint 
= Adhere to government regulations 
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Availability Zones 


e How to achieve high availability in the same 
region (or geographic location)? 
= Enter Availability Zones 
o Multiple AZs (3) in a region 


o One or more discrete data centers 


o Each AZ has independent € redundant power, networking & 
connectivity 


o AZs in a region are connected through low-latency links 


e (Advantage) Increased availability and fault 


tolerance within same region 
= Survive the failure of a complete data center 


e (Remember) NOT all Azure regions have 
Availability Zones 
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Regions and Availability Zones examples 


New Regions and AZs are constantly added 


Region Availability Zones 
E ETT 
E Mo ae 
D Mi er a 
de ETT 
E Mi a al 


Data 


Data is the "oil of the 21st Century Digital Economy" 


Amount of data generated increasing exponentially 
= Mobile devices, IOT devices, application metrics etc 
= Variety of 
o Data formats: Structured, Semi Structured and Unstructured 
o Data store options: Relational databases, NoSQL databases, Analytical 
databases, Object/Block/File storage ... 
Store data efficiently and gain intelligence 


e Goal of the course: Help you choose specific data 


format and Azure data store for your use case 
= We will start with 10,000 feet overview of cloud: 
o Regions, Zones and laaS/PaaS/SaaS 


= After that, play with different data formats and data storage 
options in Azure 


1.25E+9 


1.00E+9 


7.50E+8 


5.00E+8 
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laaS vs PaaS vs SaaS 


Azure Virtual Machines 


28 
e In corporate data centers, data stores are deployed on 
physical servers 
e Where do you deploy data stores in the cloud? 
= Rent virtual servers 


= Virtual Machines - Virtual servers in Azure 


= Azure Virtual Machines - Provision €: Manage Virtual Machines 


VM 


by 


Problem with using VMs for Databases 


e You need to take care of: 
= OS installation & upgrades 


= Database installation & upgrades 

= Availability (create a standby database) 
= Durability (take regular backups) 

= Scaling compute & storage 
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Managed Services 128 


e Do you want to continue running databases in the cloud, the 
same way you run them in your data center? 
OR are there OTHER approaches? CO 
e Let's understand some terminology used with cloud services: 
= laaS (Infrastructure as a Service) 


= PaaS (Platform as a Service) 
= SaaS (Software as a Service) 


e Let's get on a quick journey to understand these! 


laaS (Infrastructure as a Service) 


Use only infrastructure from cloud provider 
Example: Running SQL Server on a VM 


e Cloud Provider is responsible for: 
= Virtualization, Hardware and Networking 


e You are responsible for: 
= OS upgrades and patches 
= Database software and upgrades 
= Database Configuration (Tables, Indexes, Views etc) 
= Data 
= Scaling of compute & storage, Availability and Durability 
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Database Configuration 


Database Software 


Virtualization 


Physical Hardware 


Networking 


= 
E 
= 
e 
EA 


dd 


PaaS (Platform as a Service) 


Use a platform provided by cloud 
Cloud provider is responsible for: 
= Virtualization, Hardware and Networking 
= OS upgrades and patches 
= Database software and upgrades 
= Scaling, Availability, Durability etc.. 
e You are responsible for: 
= Database Configuration (Tables, Views, Indexes, ...) 
= Data 


e Examples: Azure SQL Database, Azure Cosmos DB and 
a lot more ... 


You will NOT have access to OS and Database software 
(most of the times!) 


Database Configuration 


Database Software 


Virtualization 


Physical Hardware 


Networking 


SaaS (Software as a Service) 128 
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e Centrally hosted software (mostly on the cloud) a 


= Offered on a subscription basis (pay-as-you-go) 


= Examples: 
o Email, calendaring & office tools (such as Outlook 365, Microsoft Office 365, Gmail, Google Docs) 


o Customer relationship management (CRM), enterprise resource planning (ERP) and document 
management tools 


e Cloud provider is responsible for: 
= OS (incl. upgrades and patches) 
= Application Runtime 
= Auto scaling, Availability & Load balancing etc.. 
= Application code and/or 
= Application Configuration (How much memory? How many instances? ..) 


e Customer is responsible for: 
= Configuring the software! 


Azure Cloud Service Categories - Scenarios 128 
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Scenario Solution 
= DU a 
z E ee 
= VD TT 
= Lace oe cc RE 
E a. a 
= Coca comi Dn TR” 
= o TOR RT 
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Data Formats & Data Stores 
10,000 Feet Overview 


Data Formats & Data Stores 


Data is the "oil of the 21st Century Digital Economy" 
Amount of data generated increasing exponentially 
e Data formats: 

= Structured: Tables, Rows and Columns (Relational) 

= Semi Structured: Key-Value, Document (JSON), Graph, etc 

= Unstructured: Video, Audio, Image, Text files, Binary files ... 
Data stores: 

= Relational databases 

= NoSQL databases 

= Analytical databases 

= Object/Block/File storage 


1.25E+9 
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Structured Data - Relational Databases 128 


e Data stored in Tables - Rows & 
Columns 


e Predefined schema  - Tables, 
Relationships and Constraints 


e Define indexes - Query efficiently on 
all columns 


e Used for 
= OLTP (Online Transaction Processing) use 
cases and 
= OLAP (Online Analytics Processing) use 
cases 


Minutes 


1D | DepartmentName | Name Duration | InstructorlD 


Computer Science | Algorithms 2 


2 4 
E Computer Science | Operating Systems 4 
El Computer Science | Database Management Systems 2 


1] Name : varchar(255) 
Location : varchar(255) 


headedBy : varchar(255) EO 
FirstName : varchar(255) ' 
LastName : varchar(255) . 


Phone : varchar(255) 


po... 


OO 
(Course 
ID : int 
DepartmentName : varchar(255) 
InstructorlD : int 
Duration : int 
Name : varchar(255) 


ID : int 
FirstName : varchar(255) 
LastName : varchar(255) 
Phone : varchar(255) 


Relational Database - OLTP (Online Transaction Processing) 


e Applications where large number of users make large 
number (millions) of transactions — 
= Transaction - small, discrete, unit of work 


o Example: Transfer money from your account to your friend's account 
= Heavy writes and moderate reads 


= Quick processing expected SQL Database 
e Use cases: Most traditional applications - banking, e- > 
commerce, .. 
e Popular databases: MySQL, Oracle, SQL Server etc 
e Some Azure Managed Services: 
= Azure SQL Database: Managed Microsoft SQL Server Azure Database 
= Azure Database for MySQL: Managed MySQL PostgreSQL 


= Azure Database for PostgreSQL: Managed PostgreSQL 


Relational Database - OLAP (Online Analytics Processing):28 
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e Applications allowing users to analyze petabytes of data 
= Examples: Reporting applications, Data warehouses, Business intelligence 
applications, Analytics systems 


= Data is consolidated from multiple (typically transactional) databases 
= Sample application : Decide insurance premiums analyzing data from last 
hundred years Synapse Analytics 
e Azure Managed Service: Azure Synapse Analytics 
= Petabyte-scale distributed data ware house 


= Unified experience for developing end-to-end analytics solutions 
o Data integration + Data warehouse + Data analytics 


= Run complex queries across petabytes of data 
= Earlier called Azure SQL Data Warehouse 


Relational Databases - OLAP vs OLTP 28 
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OLAP and OLTP use similar data structures 


BUT very different approach in how data is 
stored 
e OLTP databases use row storage 
= Each table row is stored together 
= Efficient for processing small transactions 
OLAP databases use columnar storage 
= Each table column is stored together 
= High compression - store petabytes of data efficiently progra 
= Distribute data - one table in multiple cluster nodes 


= Execute single query across multiple nodes - 
Complex queries can be executed efficiently 


11M 


Columnar Storage 


Cluster Columnar Storage 


1:28 
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Semi Structured Data 


Data has some structure BUT not very strict 
Semi Structured Data is stored in NoSQL databases 
= NoSQL = not only SQL 


= Flexible schema 
o Structure data the way your application needs it 


o Let the structure evolve with time 

= Horizontally scale to petabytes of data with millions of TPS 
Managed Service: Azure Cosmos DB 
Types of Semi Structured Data: 

= Document 

= Key Value 

= Graph 

= Column Family 


} 


n28 
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"customerId": "99999999", 
"firstName": "Ranga", 
"LastName": "Ranga", 
"address": { 


] 


"number": "505", 
"street": "Main Street", 
"city": "Hyderabad", 
"state": "Telangana" 


socialProfiles": [ 
"name": "twitter", 


"username": "@in28minutes" 


"name": "linkedin", 
"username": "rangaraokaranam" 


} 


TI 


GO 
= 


Semi Structured Data - 1 - Document 


e Data stored as collection of documents 
= Typically JSON (Javascript Object Notation) 


o Be careful with formatting (name/value pairs, commas etc) 
o address - Child Object - {} 
o socialProfiles - Array - [ ] 


= Documents are retrieved by unique id (called the key) 
o Typically, you can define additional indexes 


= Documents don't need to have the same structure 
o Nostrict schema defined on database 


o Apps should handle variations (application defined schema) 


= Typically, information in one document would be stored in 
multiple tables, if you were using a relational database 


e Use cases: Product Catalog, Profile, Shopping Cart etc 


e Managed Service: Azure Cosmos DB SQL API & 
MongoDB API 


} 


n28 
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"customerId": "99999999", 
"firstName": "Ranga", 
"LastName": "Ranga", 
"address": { 


] 


"number": "505", 
"street": "Main Street", 
"city": "Hyderabad", 
"state": "Telangana" 


socialProfiles": [ 
"name": "twitter", 


"username": "@in28minutes" 


"name": "linkedin", 
"username": "rangaraokaranam" 


} 


Semi Structured Data - 2 - Key-Value "28 
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emaite 

user1 { "name": “Jane”, "previousAction" : "someAction1” } 

ES user2 { "name": "Doe", "previousAction” : “"someAction2" } 
Key Value Database user3 { "name": "Doe", "previousAction” : “someAction3" } 


e Similar to a HashMap 
= Key - Unique identifier to retrieve a specific value 
= Value - Number or a String or a complex object, like a JSON file 


= Supports simple lookups - query by keys 
o NOT optimized for query by values 
o Typically, no other indexes allowed 


e Use cases: Session Store, Caching Data 
e Managed Services: Azure Cosmos DB Table API, Azure Table Storage 


Semi Structured Data - 3 - Graph "28 


Social media applications have data with complex relationships 


How do you store such data? 
= As a graph in Graph Databases 


= Used to store data with complex relationships 
Contains nodes and edges (relationships) 
Use cases: People and relationships, Organizational charts, Fraud Detection 
Managed Service: Azure Cosmos DB Gremlin API 


Semi Structured Data - 4 - Column Family "28 


Rowid Column Family 1 Column Family 2 Column Family 3 
coli col2 col3 col1 col2 col3 coli col2 col3 
1 


2 
3 


e Data organized into rows and columns 
e Can appear similar to a relational database 
e IMPORTANT FEATURE: Columns are divided into groups called column-family 


= Rows can be sparse (does NOT need to have value for every column) 


e Use cases: IOT streams and real time analytics, financial data - transaction 
histories, stock prices etc 


Managed Service: Azure Cosmos DB Cassandra API 


Unstructured Data 28 


E Virtual Server A y 
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EO Virtual Server B E 


File Storage Host Computer Block Storage 


e Data which does not have any structure (Audio files, Video files, Binary files) 
= What is the type of storage of your hard disk? 
o Block Storage (Azure Managed Service: Azure Disks) 
= You've created a file share to share a set of files with your colleagues in a enterprise. What 
type of storage are you using? 
o File Storage (Azure Managed Service: Azure Files) 
= You want to be able to upload/download objects using a REST API without mounting them 


onto your VM. What type of storage are you using? 
o Object Storage (Azure Managed Service: Azure Blob Storage) 


Relational vs Non Relational Data - Quick Overview "28 
e Relational Data (Structured Data) + 

= OLTP: SQL Server on Azure VMs, Azure SQL Database (or Azure SQL Managed b d 

Instance), Azure Database for PostgreSQL, MariaDB, MySQL + 


= OLAP: Azure Synapse Analytics Cosmos DB 
e Non Relational Data (Semi Structured/Unstructured Data) == 
= Semi Structured - Document (JSON) soL 
o Azure Cosmos DB SQL API and Cosmos DB MongoDB API 
= Semi Structured - Key-Value SQL Database 
o Azure Cosmos DB Table API, Azure Table Storage = 


Azure Database MySQL 


= Semi Structured - Column-Family 
o Azure Cosmos DB Cassandra API 


= Semi Structured - Graph 
o Azure Cosmos DB Gremlin API = 
= Unstructured Data 


o Block Storage (Azure Disks), File Storage (Azure Files), Object Storage (Azure Blob Storage) Azure Storage 


Databases - Scenarios 128 
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Scenario Solution 
A start up with quickly evolving schema for storing Azure Cosmos DB SQL API and Cosmos DB 
documents MongoDB API 
Transactional local database processing thousands of Azure SQL Database and other relational 
transactions per second databases.. 


fraud 


Database for analytics processing of petabytes of structured Azure Synapse Analytics 
data 
File share between multiple VMs Azure Files 
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Relational Databases 128 
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Structured Data - Tables, Rows and Columns 
Structured Query Language (SQL) for retrieving and managing data 


e Recommended when strong transactional consistency guarantees 
are needed 


SQL Database 


— 
e Database schema is mandatory Mu 
e Azure Managed Services: TT 
= Azure SQL Database P 
= Azure SQL Managed Instance 
= Azure Database for PostgreSQL Azure Database 


PostgreSQL 


= Azure Database for MySQL 
= Azure Database for MariaDB 


Azure SQL Database "28 


e Fully Managed Service for Microsoft SQL Server 
99.99% availability 
Built-in high availability, automatic updates and backups 
e Flexible and responsive serverless compute 
e Hyperscale (up to 100 TB) storage SQL Database 
Transparent data encryption(TDE) - Data is automatically 
encrypted at rest 
e Authentication: SQL Server authentication or Active Directory 
(and MFA) 


Relational Databases - Tables and Relationships "28 
e Relational Databases are modeled using 
Tables and Relationships === Pert 
= A Course has an Instructor o DO 


LastName : varchar(255) | iii . 
Phone : varchar(255) i 


= A Course belongs to a Department 
e Table: Table contains columns and rows : 
= All rows in a table have same set of columns o 
= Relationship between tables is established using 
Primary Key and Foreign Key 


o Primary Key: Uniquely identifies a row in a table 
o Foreign Key: Provides a link between data in two tables 


lame : varchar(255) 
InstructoriD : int 
Duration : int 

Name : varchar(255) 


FirstName : varchar(255) 
LastName : varchar(255) 
Phone : varchar(255) 


Structured Query Language 


e SQL: Language used to perform operations on relational databases 


Data Definition Language (DDL): Create and modify structure of database objects 
o Create: Create a database or its constituent objects (Table, View, Index etc) 


o Drop: Delete objects (Table, View, Index) from database 
o Alter: Alter structure of the database 


Data Query Language (DQL): Perform queries on the data 
o Example: SELECT * from Course, SELECT Count(*) from Course 


Data Manipulation Language (DML): Insert, update or delete data 


o Example: insert into Course values (1, 'AZ-900', 1); 
o Example: Update Course Set title='AZ-900 Azure Fundamentals' where id=1 
o Example: Delete from Course where id=1 


Data Control Language (DCL): Manage permissions and other controls 
o Example: Grant and revoke user access - GRANT SELECT ON course TO userl 


Transaction Control Language(TCL): Control transactions within a database 
o Commit - commits a transaction 


o Rollback -rollbacks a transaction (used in case of an error) 


SQL Database 


Index 128 
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CREATE CLUSTERED INDEX INDEX NAME on TABLE (COLUMN NAME); 


CREATE NONCLUSTERED INDEX INDEX NAME on TABLE (COLUMN NAME); 


Allows efficient data retrieval from a database 

Combination of one or more columns 

Remember: An index is automatically created with the primary key 
Remember: A table can have more than one index 


Two Types of Indexes: 


= Clustered: Data in table is stored in the order of the index key values 
o Remember: Only one clustered index per table ( Why? - data rows can only be sorted in one way) 


= Non-clustered indexes: Index stored separately with pointers to the data rows 


View 28 
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create view all courses with students 
as 


select course id, student id, first name, last name, title 
from Course Student, Student, Course 
where Course Student.student id = Student.id and 

Course Student.course id=Course.id; 


e View: Virtual table mapped to a query 
e Can be used just like a table in SQL queries 


e Use cases: Add calculated columns, join multiple tables, filter unnecessary 
columns 


Normalization 128 
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e Goals in designing relational databases: 


= High Data Integrity Ei 
= Minimum Data Redundancy (or Duplication) 
e How do achieve these goals? e 


= Database Normalization: "Process of restructuring a relational database to reduce 
data redundancy and improve data integrity" 


o First Normal Form (1NF): Single(atomic) valued columns 
o Violation Example: A column named address 


o Second Normal Form (2NF): Eliminate redundant data 


o Third Normal Form (3NF): Move columns not directly dependent on primary key 
o (REMEMBER) There are other normal forms (4NF, 5NF, ...) but 3NF is considered good enough for most relational data 


e Advantages of Normalization 
= Avoid same data being duplicated in multiple tables 


= Reduce disk space wastage 
= Avoid data inconsistencies 


Normalization example "28 
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Unnormalized - Enrollment Details 


student_first_name student_last_name title instructor_first_name instructor_last_name 


1 Ranga K AZ-900 in28minutes cloud 
2 Ranga K DP-900 in28minutes cloud 
3 Sathish M AZ-900 in28minutes cloud 
4 Sathish M DP-900 in28minutes cloud 
5 Ramesh $ AZ-900 in28minutes cloud 
6 Ramesh S Google Cloud in28minutes cloud 


Normalized - Student 


id first_name last_name 


1 Ranga K 
2 Sathish M 
3 Ramesh S 


Normalized - Instructor 


id first_name last_name 


1 in28minutes cloud 


Normalization example - 2 


Normalized - Course 


Normalized - Course Student 


title 

AZ-900 
DP-900 
Google Cloud 
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instructor_id 
1 
1 
1 


id course id student id 


1 


o vw ek uU N ma 


2 
1 
2 
1 
3 


1 
1 
2 
2 
3 
3 


Transactions "28 
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e Transaction: Sequence of operations that need to be atomic 
= All operations are successful (commit) OR NONE are successful (rollback) E 
= Example: Transfer $10 from Account A to B 


o Operation 1: Reduce $10 from Account A 
o Operation 2: Add $10 to Account B 


o If Operation 1 is successful and Operation 2 fails - Inconsistent state 
o You don'twantthat! 


Database 


e Properties: ACID (Atomicity, Consistency, Isolation, Durability) 


= Atomicity: Each transaction is atomic (either succeeds completely, or fails 
completely) 


= Consistency: Database must be consistent before and after the transaction 
= Isolation: Multiple Transactions occur independently 


= Durability: Once a transaction is committed, it remains committed even if there are 
system failures (a power outage, for example) 


e Remember: Supported in all Relational Databases 


Azure SQL Database - Purchase Models 


e vCore-based: Choose between provisioned or serverless compute 
= OPTIONAL: Hyperscale (Autoscale storage) 


= Higher compute, memory, I/O, and storage limits 
= Supports BYOL 


= Serverless Compute: Database is paused during inactive periods 
o You are only billed for storage during inactive periods 


o If there is any activity, database is automatically resumed 
e DTU-based: Bundled compute and storage packages 
= Balanced allocation of CPU, memory and IO 
= Assign DTUs (relative - Double DTU => Double resources) 


= Recommended when you want to keep things simple 
o You CANNOT scale CPU, memory and IO independently 


= Use DTUs for small and medium databases (< few hundred DTUs) 
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SQL Database 
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Azure SQL Database - Important Features 128 
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Feature Description 


Single database Great fit for modern, cloud-born applications 
Fully managed database with predictable performance 
Hyperscale storage (up to 100TB) 
Serverless compute 


Elastic pool Cost-effective solution for multiple databases with variable usage patterns 
Manage multiple databases within a fixed budget 


Database server Database servers are used to manage groups of single databases and elastic pools. 
Things configured at Database server level: Access management, Backup management 


Azure SQL Database - Remember 


e Prerequisites to connect and query from Azure SQL 


database: 
= 1: Connection Security: Database should allow connection from 
your IP address 


= 2: User should be created in the database 
= 3: User should have grants (permissions) to perform queries - 
Select, Insert etc. 
e Use BYOL to reduce license costs 


e Use read-only replicas (Read scale-out) for offloading 
read-only query workloads 


1:28 
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Reporting 
Application 


Application 


Asynchornous 
Replication 
Database 


Azure SQL managed instance 28 
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e Another Fully Managed Service for Microsoft SQL Server 
e What's New: Near 100% SQL Server feature compatibility 
e Recommended when migrating on premise SQL Servers to Azure 


e Azure SQL managed instance features NOT in Azure SQL Database 
= Cross-database queries (and transactions) within a single SQL Server instance 
= Database Mail 


= Built in SQL Server Agent 
o Service to execute scheduled administrative tasks - jobs in SQL Server 
= Native virtual network support 


e Supports only vCore-based purchasing model 


e (Remember) SQL Server Analysis Services (SSAS), SQL Server 


Reporting Services (SSRS), Polybase: NOT supported by both Azure 
SQL Database and SQL Managed Instance 


Ed 


SQL Server in Azure - Summary 28 
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Service Description 
SQL Server on Azure Provides full administrative control over the SQL Server instance and underlying 
Virtual Machines OS for migration to Azure 
Azure SQL Database Fully Managed Service for Microsoft SQL Server. 
Recommended for cloud-born applications 
Azure SQL managed Full (Near 100%) SQL Server access and feature compatibility 
instance Recommended for migrating on-premise SQL Server databases 


Azure SQL managed instance ONLY features: Cross-database queries, Database 
Mail Support, SQL Server Agent etc. 


Azure database for MySQL "28 


e Fully managed, scalable MySQL database 
e Supports 5.6, 5.7 and 8.0 community editions of MySQL 
e 99.99% availability 


= Choose single zone or zone redundant high availability Azure Database MySQL 
e Automatic updates and backups 


e Alternative: Azure Database for MariaDB 
= MariaDB: community-developed, commercially supported fork of MySQL 


Azure Database for PostgreSQL 


e Fully managed, intelligent and scalable PostgreSQL 
e 99.99% availability 
= Choose single zone or zone redundant high availability 
e Automatic updates and backups 
e Single Server and Hyperscale Options 


= Hyperscale: Scale to hundreds of nodes and execute queries across 
multiple nodes 


Azure Database 
PostgreSQL 


Relational Data - Scenarios 28 
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Scenario Solution 
You are migrating a Microsoft SQL Server database to cloud. You want full access to SQL Server on VM 
OS and Microsoft SQL Server installation. 
You are migrating a Microsoft SQL Server database to cloud. You do NOT need full Azure SQL Managed 
access to OS and Microsoft SQL Server installation. However, you need access to Instance 
Database Mail and SQL Server Agent. 
You want create a new managed Microsoft SQL Server database in cloud Azure SQL 
Database, Azure 
SQL Managed 
Instance 
Which category of SQL is this? GRANT SELECT ON course TO userl Data Control 


Language (DCL) 


Relational Data - Scenarios - 2 28 
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Scenario Solution 
Which category of SQL is this? create table course Data Definition Language (DDL) 
bens) 


Your queries on a relational databases are slow. What is the first Check if there is an index 

thing that you would consider doing? 

Your colleague asked you to normalize your tables. What should High Data Integrity & Minimum Data 
be your goals? Redundancy (or Duplication) 


How can you offload read-only workloads from Azure SQL Read-only replicas (Read scale-out) 
database? 


inutes 


Azure Cosmos DB 


Relational vs Non Relational Data - Quick Overview "28 
e Relational Data (Structured Data) + 
= OLTP: Azure SQL Database, Azure SQL Managed Instance, SQL Server on Azure VMs, Y 
Azure Database for PostgreSQL, MariaDB, MySQL + 


= OLAP: Azure Synapse Analytics Cosmos DB 
e Non Relational Data (Semi Structured/Unstructured Data) == 
= Semi Structured - Document (JSON) soL 
o Azure Cosmos DB SQL API and Cosmos DB MongoDB API 
= Semi Structured - Key-Value SQL Database 
o Azure Cosmos DB Table API, Azure Table Storage = 


Azure Database MySQL 


= Semi Structured - Column-Family 
o Azure Cosmos DB Cassandra API 


= Semi Structured - Graph 
o Azure Cosmos DB Gremlin API = 
= Unstructured Data 


o Block Storage (Azure Disks), File Storage (Azure Files), Object Storage (Azure Blob Storage) Azure Storage 


Azure Cosmos DB "28 
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e Fully managed NoSQL database service 


+ 
e Global database: Automatically replicates data across multiple Azure Y 
regions há 


= Single-digit millisecond response times iS 
= 99.999% availability 

= Automatic scaling (serverless) - Storage and Compute 

= Multi-region writes 


= Data distribution to any Azure region with the click of a button 
o Your app doesn't need to be paused or redeployed to add or remove a region 
e Structure: Azure Cosmos account(s) > database(s) > container(s) > 
item(s) 


Azure Cosmos DB APls 128 


e Core(SQL): SQL based API for working with documents 
e MongoDB: Document with MongoDB API 


= Move existing MongoDB workloads 
e Table: Key Value ds 
= Ideal for moving existing Azure Table storage workloads 
e Gremlin: Graph 
= Store complex relationships between data 
e Cassandra: Column Family 


e REMEMBER: You need a separate Cosmos DB account for each type of 
API 


Azure Cosmos DB - What is Different? KAU 


e Single-digit millisecond response times even if you 
scale to petabytes of data with millions of TPS a X AORN 
= Horizontal scalability 7 | 


e One thing | love about Azure Cosmos DB: Flexibility 


= Structure data the way your application needs it 
o Let the structure evolve with time 


= Provides a variety of consistency levels 
o Strong, Bounded staleness, Session, Consistent prefix, Eventual 


= If you are familiar with SQL but want to still use document 
database use SQL API 
= Options for key-value, column-family and graph databases 


Cosmos DB - Structure 28 
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Entity SQL Cassandra MongoDB Gremlin Table 
E GETE 
E o... al 
= E CR RO a RO ds 


Cosmos DB - Logical and Physical Partitions 


Each container is horizontally 

partitioned in an Azure region 

= ALSO distributed to all Azure 
regions associated with the 
Cosmos DB account 

Items in a container divided 

into logical partitions based 

on the partition key 


Cosmos DB take care of 
categorizing logical partitions 
into physical partitions 
Ensures high availability and 
durability 


(https://docs.microsoft. com) 
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“city” : "London | “city” :"NYC" | taty :"Paris" || “city” :"Rome" | “airport” :"AMS" | “airport”: "SEA" | | “airport”: "LAX" | 
pei pei i JE J E E E pz = E EZ 
— Err EE EB == — == Logical 
= “ee == == HH EO == Partitions 
EE ' .. E EE ESE 
EO E == 
: : = Sa 
E H == a BE eee 
` ' E EE EE 
Oz d O m ZO ra Physical 
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Cosmos DB - Provisioned throughput vs Serverless "28 
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Factor Provisioned throughput Serverless 

Description Provision throughput in Request Units per No need to provision capacity. Auto scales to 
second meet request load. 

What are you RUs provisioned per hour (usage does per-hour RUs consumed + Storage 

billed for? NOT matter) + Storage 

When to use? Continuous predictable traffic Intermittent, unpredictable traffic 

Multi Regions Yes No - only in 1 Azure region 

Max storage per No limit 50 GB 

container 

Performance < 10 ms latency for point-reads and writes < 10 ms latency for point-reads and < 30 ms 

for writes 


Azure Cosmos DB - Scenarios 1228 
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Scenario Solution 

How can you increase storage associated with Azure Automatic scaling (serverless) 

Cosmos DB? 

What is the high level structure of storing data in Azure Azure Cosmos account(s) > database(s) > 
Cosmos DB? container(s) > item(s) 

How are items in a container divided into logical Using partition key 

partitions? 

You want to store data for a social networking app with Gremlin API 


complex relationships 
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Relational vs Non Relational Data - Quick Overview "28 
e Relational Data (Structured Data) + 
= OLTP: Azure SQL Database, Azure SQL Managed Instance, SQL Server on Azure VMs, Y 
Azure Database for PostgreSQL, MariaDB, MySQL + 


= OLAP: Azure Synapse Analytics Cosmos DB 
e Non Relational Data (Semi Structured/Unstructured Data) == 
= Semi Structured - Document (JSON) soL 
o Azure Cosmos DB SQL API and Cosmos DB MongoDB API 
= Semi Structured - Key-Value SQL Database 
o Azure Cosmos DB Table API, Azure Table Storage = 


Azure Database MySQL 


= Semi Structured - Column-Family 
o Azure Cosmos DB Cassandra API 


= Semi Structured - Graph 
o Azure Cosmos DB Gremlin API = 
= Unstructured Data 


o Block Storage (Azure Disks), File Storage (Azure Files), Object Storage (Azure Blob Storage) Azure Storage 


Azure Storage 
e Managed Cloud Storage Solution 


= Highly available, durable and massively scalable (upto few PetaBytes) 
e Core Storage Services: 

= Azure Disks: Block storage (hard disks) for Azure VMs 

= Azure Files: File shares for cloud and on-premises 

= Azure Blobs: Object store for text and binary data 

= Azure Queues: Decouple applications using messaging 


= Azure Tables: NoSQL store (Very Basic) 
o Prefer Azure Cosmos DB for NoSQL 


e (PRE-REQUISITE) Storage Account is needed for Azure Files, 
Azure Blobs, Azure Queues and Azure Tables 
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Azure Storage 


Azure Storage - Data Redundancy 28 
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Option Redundancy Discussion 
Locally redundant Three synchronous copies in same data center Least expensive and least 
storage (LRS) availability 
Zone-redundant storage Three synchronous copies in three AZs in the primary 

(ZRS) region 

Geo-redundant storage LRS + Asynchronous copy to secondary region (three 

(GRS) more copies using LRS) 

Geo-zone-redundant ZRS + Asynchronous copy to secondary region (three Most expensive and 
storage (GZRS) more copies using LRS) highest availability 


Block Storage 


e Use case: Hard-disks attached to your 
computers 


e Typically, ONE Block Storage device can be 
connected to ONE virtual server 

e HOWEVER, you can connect multiple different 
block storage devices to one virtual server 


Host Computer 


Block Storage 


Azure Disks Storage "28 
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e Disk storage: Disks for Azure VMs 
= Types: 
o Standard HDD: Recommended for Backup, non-critical, infrequent access 


o Standard SSD: Recommended for Web servers, lightly used enterprise applications and 
dev/test environments 


o Premium SSD disks: Recommended for production and performance sensitive workloads 


o Ultra disks (SSD): Recommended for IO-intensive workloads such as SAP HANA, top tier Azure Storage 
databases (for example, SQL, Oracle), and other transaction-heavy workloads 


= Premium and Ultra provide very high availability 


e Managed vs Unmanaged Disks: 


= Managed Disks are easy to use: 
o Azure handles storage 


o High fault tolerance and availability 
= Unmanaged Disks are old and tricky (Avoid them if you can) 
o You need to manage storage and storage account 
o Disks stored in Containers (NOT Docker containers. Completely unrelated.) 


Azure Files 


e Media workflows need huge shared 
storage for things like video editing 


e Enterprise users need a quick way to 
share files in a secure & organized way 
e Azure Files: 
= Managed File Shares 


= Connect from multiple devices concurrently: 
o From cloud or on-premises 
o From different OS: Windows, Linux, and macOS 
= Supports Server Message Block (SMB) and 
Network File System (NFS) protocols 
= Usecase: Shared files between multiple VMs 
(example: configuration files) 
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Virtual Server A 


Virtual Server B 


File Storage Host Computer 


Azure Blob Storage "28 


Minutes 


e Azure Blob Storage: Object storage in Azure 
e Structure: Storage Account > Container(s) > Blob(s) 


e Store massive volumes of unstructured data 


= Store all file types - text, binary, backup & archives: 
o Media files and archives, Application packages and logs 


o Backups of your databases or storage devices AZE Soage 

e Three Types of Blobs 

= Block Blobs: Store text or binary files (videos, archives etc) 

= Append Blobs: Store log files (Ideal for append operations) 

= Page Blobs: Foundation for Azure laaS Disks (512-byte pages up to 8 TB) 
e Azure Data Lake Storage Gen2: Azure Blob Storage Enhanced 

= Designed for enterprise big data analytics (exabytes, hierarchical) 

= Low-cost, tiered storage, with high availability/disaster recovery 


Azure Blob Storage - Access Tiers 


e Different kinds of data can be stored in Blob Storage 
= Media files, website static content 
= Backups of your databases or storage devices 
= Long term archives 


e Huge variations in access patterns 


e Can I pay a cheaper price for objects | access less frequently? 
= Access tiers 


o Hot: Store frequently accessed data 
o Cool: Infrequently accessed data stored for min. 30 days 


o Archive: Rarely accessed data stored for min. 180 days 
o Lowest storage cost BUT Highest access cost 


o Access latency: In hours 


o To access: Rehydrate (Change access tier to hot or cool) OR 
o Copy to another blob with access tier hot or cool 


o You can change access tiers of an object at any point in time 
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Azure Storage 


Azure Storage - Remember 


e Azure Queues: Decouple applications using messaging 
e Azure Tables: NoSQL store (Very Basic) 

= A key/value store 

= Store and retrieve values by key 

= Supports simple query, insert, and delete operations 


= Cosmos DB Table API is recommended as key/value store for newer 
usecases (supports multi-master in multiple regions) 


= Azure Tables only supports read replicas in other regions 


o GRS or GZRS: Data in secondary region is generally NOT available for read or write access 
o Available for read or write only in case of failover to the secondary region 


o To enable round the clock read access: 
o Use read-access geo-redundant storage (RA-GRS) or read-access geo-zone-redundant storage (RA-GZRS) 
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Azure Storage - Scenarios 128 


Minutes 


Scenario Solution 


What is needed before storing data to Azure Files, Azure Blobs, Azure Queues and Azure Storage Account 
Tables? 


You have a Storage Account and you are making use of Azure Blob Storage. You want to No 

create a new file share. Is it mandatory to create a new Storage Account? 

You want highest availability for data in your Storage Account Geo-zone- 
redundant 
storage (GZRS) 

Which service supports Server Message Block (SMB) and Network File System (NFS) Azure Files 

protocols? 

You are not planning to access your data in Azure Blob storage for a few years. You can Move data to 


wait for a few hours when you need to access the data. How can you reduce your costs? Archive tier 
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Data Analytics “28 
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e Goal: Convert raw data to intelligence 
= Uncover trends and discover meaningful information 
= Find new opportunities and identify weaknesses 
= Increase efficiency and improve customer satisfaction 
= Make appropriate business decisions 
e Raw data can be from different sources: 
= Customer purchases, bank transactions, stock prices, weather 
data, monitoring devices etc 
e Approach: Ingest => Process => Store (data warehouse 
or a data lake) => Analyze 


Ex: Decide future sales using past customer behavior 
Ex: Faster diagnosis & treatment using patient history 


Data Analytics Work Flow 128 
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Data Ingestion: Capture raw data 


= From various sources (stream or batch) 
o Example: Weather data, sales records, user actions - websites .. 


Data Processing: Process data 


= Raw data is not suitable for querying 
o Clean (remove duplicates), filter (remove anomalies) and/or aggregate data 


o Transform data to required format (Transformation) 


Data Storage: Store to data warehouse or data lake 
Data Querying: Run queries to analyze data 


Data Visualization: Create visualizations to make it 


easier to understand data and make better decisions 
= Create dashboards, charts and reports (capture trends) 


= Help business spot trends, outliers, and hidden patterns in data 


Data Analysis Categories "28 
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e Descriptive analytics: What's happening? 

= Based on historical/current data 

= Monitor status (of KPIs) and generate alerts 

= Example: Generating reports (current vs planned) 
e Diagnostic analytics: Why is something happening? 

= Take findings from descriptive analytics and dig deeper 

= Example: Why did sales increase last month? 

= Example: Why are sales low in Netherlands? 
e Predictive analytics: What will happen? 

= Predict probability based on historical data 

= Mitigate risk and identify opportunities 

= Example: What will be the future demand? 

= Example: Calculate probability of something happening in future 


1.25E+9 


1.00E+9 


7.50E+8 


5.00E+8 


Data Analysis Categories - 2 


e Prescriptive analytics: What actions should we take? 
= Use insights from predictive analytics and make data-driven 
informed decisions 
= Still in early stages 
= Example: What can | do to increase probability of this course 
being successful in future? 


e Cognitive analytics: Make analytic tools to think like 


humans 

= Combine traditional analytics techniques with Al and ML 
features 

= Examples: Speech to text (transcription or subtitles), text to 
speech, Video Analysis, Image Analysis, Semantic Analysis of 
Text (Analyze reviews) 


1.25E+9 


1.00E+9 


7.50E+8 


5.00E+8 
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Big Data - Terminology and Evolution 28 
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e 3Vs of Big Data 
= Volume: Terabytes to Petabytes to Exabytes = ES q 


= Variety: Structured, Semi structured, Unstructured 
= Velocity: Batch, Streaming... 


e Terminology: Data warehouse vs Data lake 


= Data warehouse: PBs of Storage + Compute (Typically) 


o Data stored in a format ready for specific analysis! (processed data) 
o Examples: Teradata, BigQuery(GCP), Redshift(AWS), Azure Synapse Analytics 


o Typically uses specialized hardware 


= Data lake: Typically retains all raw data (compressed) 


o Typically object storage is used as data lake 
o Amazon S3, Google Cloud Storage, Azure Data Lake Storage Genz? etc.. 


o Flexibility while saving cost 
o Perform ad-hoc analysis on demand 


o Analytics & intelligence services (even data warehouses) can directly read from data lake 
o Azure Synapse Analytics, BigQuery(GCP) etc.. 


VARIETY 


Data warehouse Best Practice - De-normalized Star Schema 


CALE 
io SALES Fact fact table ProductKey 
ay - 


ProductiD 
ProductName 


E 


DollarsSold CUSTOMER 


CustomerZip 


e How do you structure data for quick analysis in a data warehouse? 
= Option: Star Schema 


= Modeling approach most widely used by relational data warehouses 


e Each Table classified as "Dimension" or "Fact": 


= Fact tables: Quantitative data - Data generated in a transactional system (typically) 
o Contains observations or events (sales orders, stock balances, temperatures ...) 


= Dimension tables: Contain descriptive attributes related to fact data 
o Example: Product, Customer, Store, Calendar 


e Advantage: Star schemas are de-normalized and are easier to query 


Data Analytics: 3 Azure Specific Services 128 
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e Azure Synapse Analytics: End-to-end analytics solutions 
= Data integration + Enterprise data warehouse + Data analytics 
= Create SQL and Spark pools to analyze data 
e Azure Data Factory: Fully managed serverless service to build complex data 
pipelines 
= Extract-transform-load (ETL), extract-load-transform (ELT) and data integration 


e Power Bl: Create visualization around data 
= Unify data and create Bl reports & dashboards 


Big Data - Hadoop, Spark and Databricks "28 
e Hadoop based approaches: 


= Apache Hadoop: Create datasets with variety of data. Get intelligence. 


4 
o Runs on commodity servers with attached storage (Large clusters - thousands of nodes) Oy 
o Hadoop Distributed File System (HDFS): Primary data storage 


o MapReduce: Write Java, Python, .. apps to process data unem OY 
o Enables massive parallelization SparK 
o HIVE: Query using SQL 
o Apache Spark: How about processing in-memory? 
o Really fast: Can be up to 100 times faster than MapReduce (if you make sufficient memory available) 
o Supports Java, Python, R, SQL and Scala programming languages 
o Run data analytics, data processing and machine learning workloads 
o Has become very popular and is offered as a separate service in most cloud platforms! 
= Databricks: Web-based platform for working with Spark 
o Centralized platform for machine learning, streaming analytics and business intelligence workloads 
o Founded by the creators of Apache Spark 
o Automated cluster management 


Hadoop and Spark in Azure 28 
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e Azure HDInsight: Managed Apache Hadoop Azure service 
= Process big data with Hadoop, Spark 


e Azure Databricks: Managed Apache Spark service 
= Premium Spark offering 


= Focused only on running Apache Spark workloads 
= Can consume data from Azure SQL Database, Event hubs, Cosmos DB 
e Other Azure Spark Integrations: 


= Azure Synapse Analytics: Can run Spark jobs using "Apache Spark for Azure 
Synapse" 


= Azure Data Factory: Run pipelines involving Azure services like Azure HDInsight, 
Azure Databricks 


Massive Parallel Processing (MPP) 28 
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e Split processing across multiple compute nodes 


e Typically separate storage and compute 
= Use Data lake as storage (for example) 
= Scale compute on demand 


e Examples: Spark, Azure Synapse Analytics 
= Some services run Spark in serverless mode! 


Batch Pipelines "28 
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Analytics 


Data 


an 
Sources 


reporting 


e Data Factory 


Orchestration + Oozie (HDinsight) 


(https://docs.microsoft. com) 


e Batch Processing: Buffering and processing data in groups 
= Define condition - how often to run? (every 6 hours or after 10K records) 


= Advantages: Process huge volumes of data during off-peak hours (overnight, for example) 
o Typically takes longer to run (minutes to hours to days) 
= Example: Read from storage (Azure Data Lake Store), process, and write to Relational 
Database or NoSQL Database or Data warehouse 


Streaming Pipelines 


pk Data Storage 


Data 
Sources 


Analytical 
data store 


Ingestion Stream Processing 
e Event Hubs e Stream Analytics 


e loT Hut e Storm 


e Kafka e Spark Streaming 


(https://docs.microsoft. com) 


e Streaming Processing: Real-time data processing 
= Processing data as it arrives (in seconds or milliseconds) 


Analytics 
and 
reporting 


28 
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= Examples: Stock Market Data, Telemetry from IOT Devices, User action metrics from websites 


Stream vs Batch Processing "28 


Minutes 


Feature Batch Streaming 

Time Process data in batches - all data from few Process most recent data (last 30 seconds, for 

Period hours to few days to few months example). 

Data Process large datasets efficiently Process individual records or micro batches 

Size containing a few records 

Latency High - Typically few hours Low - Typically few seconds or milliseconds 

Usecase Use for performing complex storage or Used for storing individual records, simple 
analysis aggregation or rolling average calculations 


Apache Parquet 


e Open source columnar storage format 


e High compression because of columnar 
storage 


e Efficient storage for big data workloads 
e Introduced by the Apache Hadoop ecosystem 
Supported by most big data platforms: 


= Azure Data Factory supports Parquet for both read and 
write (Source and Sink) 


= Azure Data Lake Storage / Azure Blob Storage - Store 
data in Parquet format 


= Azure Synapse Analytics can be used to store tabular 
representation of data in Parquet format 


Row Storage 


n28 
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11M 


Columnar Storage 


ETL "28 
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ETL (Extract, Transform, and Load): Retrieve data, process and store it 
Data can be from multiple sources 
e Recommended for simple processing: 

= Basic data cleaning tasks, de-duplicating data, formatting data 

= Example: Ensure data privacy and compliance 

o Removing sensitive data before it reaches analytical data models 

e Can run each of the phases in parallel 
= While extract is going on, you can transform data which is already loaded 


ELT "28 
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e ELT (Extract, Load, and Transform): Data is stored before it is transformed 
e Uses an iterative approach (multiple steps) to process data in target system 


e Needs a powerful target datastore: 
= Target datastore should be able to perform transformations 


e Advantage: Does NOT use a separate transformation engine 


e Typical target data stores: Hadoop cluster (using Hive or Spark), Azure 
Synapse Analytics 


= Enables use of massively parallel processing (MPP) capabilities of data stores 


Azure Synapse Analytics 


On-premises 
data 


Streaming 
data < 


Cloud data a 


SaaS data Ga 


> 


© Azure Synapse Analytics 


Studio 


Data Integration Management Monitoring Security 


Analytics runtimes 


u SQL E Apache Spark 


= Azure Data Lake Storage Gen2 


(https://docs.microsoft. com) 


e Develop end-to-end analytics solutions 


= Data integration + Enterprise data warehouse + Data analytics 


= SQL technologies + Spark technologies + Pipelines 
= Full integration with Power Bl, Cosmos DB, and Azure ML 


> Azure Purview 


Azure 
Machine Learning 


(ul) Power Bl 
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Azure Synapse Analytics - Workflow 


e In a workspace, create pipelines for: 
= Data Ingestion: 
o Ingest data from 90+ data sources (Cosmos DB,AWS, GCP..) 
o Stream data into SQL tables 


= Data Storage: Datasets - Azure Storage, Azure Data 
Lake Storage 


o Formats: Parquet, CSV, JSON .. 


= Data Processing: Mix & match SQL and Spark 

o SQL pool: SQL Database supporting distributed T-SQL queries 
o Two consumption models: dedicated and serverless 
o Recommended for complex reporting & data ingestion using Polybase 
o SQL Pool can be paused to reduce compute costs 

o Apache Spark pools: Run Spark based workloads 
o 1: Create Spark data analysis notebooks OR 
o 2: Run batch Spark jobs (jar files) 
o Recommended for data preparation and ML 
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Azure Data Factory "28 
e Fully managed serverless service to build complex data pipelines: 


= Extract-transform-load (ETL), extract-load-transform (ELT) and data integration 
o 90 built-in connectors 


o Ingest data from: 
o Big Data sources like Amazon Redshift, Google BigQuery Data Factory 


o Enterprise data warehouses like Oracle Exadata, Teradata 
o All Azure data services 
= Build data flows to transform data 


o Integrate with services like Azure HDInsight, Azure Databricks, Azure Synapse Analytics for data 
processing 


= Move SQL Server Integration Services (SSIS) packages to cloud 


e CI/CD support with Azure Devops 


Demo - Azure Data Factory and Synapse Analytics "28 


e Create a Data Lake Storage Account Gen2 
e Create a SQL Server Database 
e Task: Extract data from SQL Server to CSV file 


Azure Data Lake Storage (Gen2) 


e Blob storage + Hierarchical directory structure 
e Configure permissions(RBAC) at file and directory level 
e Fully compatible with Hadoop Distributed File System (HDFS) 


= Apache Hadoop workloads can directly access data in Azure Data Lake 
Storage 


e Three main elements: 
= Data Lake Store: Azure Data Factory, Azure Databricks, Azure HDInsight, 
Azure Data Lake Analytics, and Azure Stream Analytics can read directly 
= Data Lake Analytics: Run analytics jobs using U-SQL 
= HDInsight: Run Hadoop jobs 
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Data Lake Storage 


Azure Data Factory - Components 28 
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e Pipeline: Logical group of activities that can be scheduled o 
= You can chain activities in a pipeline 
= You can run activities sequentially or in parallel Es 
= À pipeline can execute other pipelines 


e Activity: Represents a step in a pipeline (an action to be performed) 
= Copy Activity: Copy data from one store to another store 
o Example: Copy CSV from Blob Storage to a Table in SQL Database 


= Three types of activities: Data movement, Data transformation, Control activities 
e Data Flow: Create and manage data transformation logic 
= Build reusable library of data transformation routines 
= Executes logic on a Spark cluster: 
o You don't need to manage the cluster (it is spun up and down automatically as needed) 
e Control flow: Orchestrate pipeline activity based on output of 
another pipeline activity 


Data Factory 


Azure Data Factory - Components - 2 28 


e Linked Service: Used to connect to an external source 
= Connect to different sources like Azure Storage Blob, SQL Databases etc 


e Dataset: Representation of data structures within data stores 


e Integration Runtime: Compute infrastructure used by Azure Data TE 
Factory allowing you to perform 


e Triggers: Trigger pipeline at a specific times 


Power Bl 128 


Minutes 


e Power Bl: Unify data and create Bl reports € dashboards 
= Integrates with all Azure analytics services b 


o Azure Synapse Analytics to Azure Data Lake Storage 


= Power Bl Components 


o Power BI Service: Online SaaS (Software as a Service) service 
o Power Bl online - app.powerbi.com 


o Create/share reports and dashboards 


o Power Bl Desktop: Windows desktop application to create and share reports 
o More data sources, Complex modeling and transformations 


o Power BI Report Builder: Standalone tool to author paginated reports 
o Power BI Mobile Apps: Apps for Windows, ¡OS, and Android devices 
= Typical Power BI Workflow: 
o 1: Create a report with Power BI Service/Desktop (or paginated report with Power BI Report Builder) 
o 2: Share it to the Power BI service 


o 3: View and interact with report (and create dashboards) using Power BI service 
o Reports can also be accessed from Power BI mobile 


Power Bl 


TUO 


Power Bl Dashboard 28 


Minutes 
e Workspace: Container for dashboards, reports, workbooks & datasets 
= Dataset: Collection of data 
o Can be a file(Excel, CSV etc) or a database "nij 
o Azure SQL Database, Azure Synapse Analytics, Azure HDInsight, .. 
o Each dataset can be used in multiple reports Power BI 


= Report: One or more pages of visualizations 
o Highly interactive and highly customizable 
o All data for a report comes from a single dataset 
o Areport can be used in multiple dashboards 
= Paginated Reports: Create pixel perfect multi page reports for printing & archiving 
(PDF/Word) 


o Create in "Power BI Report Builder" and publish to use in Power Bl service 


= Dashboard: Single page - visualizations from one or more reports 
o Technically a canvas with multiple tiles 


o Monitor the most important information at one glance and dig deeper, if needed 
o You can select a tile and go to a report page to dig deeper 


Visualization Options 


Bar and column charts: Most basic of charts 

Line Charts: Emphasize shape of a series of values over time 
Pie Charts: Displays division of total into different categories 
Matrix: Summarize data in a tabular structure 

Treemap: Charts of colored rectangles 


Scatter: Shows relationship between two numerical values 
= Bubble chart: Replace data points with bubbles 


o Bubble size represents a 3rd dimension 
Filled map: Show on a Map 
Cards: Single number 
Link: Reference for all visualization options in Power Bl 
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Data Analytics Work Flow - Data Ingestion "28 
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e Data Ingestion: Capture raw data 
= Azure Data Factory: data ingestion and transformation service 
o Ingest streaming and batch data 
o Data from on-premises and cloud 


o PolyBase: Run T-SQL queries on external data sources 
o PolyBase makes external data sources appear like tables 


o SQL Server Integration Services (SSIS): on-premises tool data integration 
and data transformation solution that is part of Microsoft SQL Server 
o Run existing SSIS packages as part of Azure Data Factory pipeline 


= Spark: Ingest streaming data 
= IOT Hub: Managed message hub for loT devices 


= Event Hub: Big data streaming platform and event ingestion 
service 


Data Analytics Work Flow - Data Processing and Storage :28 


e Data Processing and Storage: 
= Azure Data Lake Storage Gen2: Data lake storage 
= Azure Synapse Analytics: Data processing can be done using: 


o 1: T-SQL - Query using SQL from databases, files, and Azure Data Lake 
storage 


o 2: Spark - Write and run Spark jobs using Cf, Scala, Python, SQL etc 
= Azure Databricks: Process data from Azure Blob storage, Azure 
Data Lake Store, Hadoop storage, flat files, databases, and data 
warehouses 
o Handle streaming data 


= Azure HDInsight: Storage - Azure Data Lake storage 
o Analyze data using Hadoop Map/Reduce, Apache Spark, Apache Hive (SQL) 


= Azure Data Factory: Build pipelines and data-driven workflows 
o Ingest data from relational and non-relational systems 


Data Analytics Work Flow - Querying and Visualization  :28 
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e Data Querying: Run queries to analyze data 
= Recommended Services: Azure Synapse Analytics, Hive (SQL) 
e Data Visualization: Create dashboards, charts and 


reports 
= Recommended Services: Power BI 


Data Analytics - Scenarios 128 
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Scenario Solution 
ma E dC ee ce a 
vs planned analytics 
= o Me CI TR 
E Decide Data Analysis Category: Semantic Analysis of Text (Analyze reviews) Cognitiveanalytics 
: , ETETE 
Se EEE 
E ou want to move SQL Server Integration Services (SSS) packages tocloud Use AzwreData 
actory 
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Scenario Solution 
= E ere re 
pipeline activity 
= Mc CR 
E Compute infrastructure used by Azure Data Factory IntegrationRuntime _ 
=" < e eN 
Builder 
Ee GEO 
ETT a 
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Other Important Azure 
Concepts 


Database Tools "28 


Minutes 


Tool Description 


Azure Data Studio Cross-platform (Windows, Mac, linux) db tool with Intellisense, code snippets and 
source control 
Run SQL queries. Save results in different formats - text, JSON, Excel 
Supports SQL Server, Azure SQL Database, Azure Synapse Analytics.. 
Notebooks: Create and share documents with text, images and SQL query results 
Support to create and restore backup from SQL Database 


SQL Server Graphical tool for managing SQL Server and Azure Databases 
Management Studio Query, design, and manage your databases and data warehouses 
(SSMS) Supports configuration, management and administration tasks 

Suitable for SQL Server, SQL Database, Azure Synapse Analytics 
SQL Server Data Build SQL Server and Azure SQL relational databases, Analysis Services (AS) data 
Tools (SSDT) models, Integration Services (IS) packages, and Reporting Services (RS) reports 
sqlcmd Run SQL scripts and procedures from command line 


Supports SQL Server, Azure SQL Database, Azure SQL MI, Azure Synapse Analytics 
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Role Description 


Database Role: Install, upgrade, control (authorization, availability, durability, performance 
Administrator optimization, backups, disaster recovery, compliance with licensing) of data servers 
Tools: Azure Data Studio, SQL Server Management Studio (SSMS) 


Data Responsible for data architecture, data acquisition, data ingestion, data processing 
Engineers (transformation, cleansing and pipelines) and data storage (design, build and test) for 
analytical workloads 
Responsible for build, test, monitoring, performance optimization of data pipelines 
Responsible for improving data reliability, efficiency, and quality 
Tools: Azure Data Studio, Azure HDInsight, Azure Databricks, Azure Data Factory, Azure 
Cosmos DB, Azure Storage ... 
Programming Languages - HiveQL, R, or Python 
Data Analyst Responsible for getting intelligence from data through integration of data(from multiple 
sources), dashboards, reports, visualizations (charts, graphs, ..) and pattern identification 
(from huge volumes of data) 
Tools: Microsoft Excel, Power Bl... 


Pricing calculator 


e Estimate the costs for Azure services 
e Example Services that you can estimate costs for: 
= Virtual Machines 
= Storage Accounts 
= Azure SQL Database 
= Azure Cosmos DB 


e Ideal place to explore and learn important factors about 
different Azure services 
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Store Machine Management services (h ttps://docs. microso ft. com/) 


e Deployment and management service for Azure 


e All actions to any resource in Azure go through ARM 


= Irrespective of where you are performing it from 
o Azure portal OR Powershell OR CLI or ARM template or... 


Azure Resource Manager (ARM) templates "28 


e Lets consider an example: 
= | would want to create an Azure SQL Database 
= | would want to create an Azure Data Lake Storage Gen2 
= | would want to create an Azure Data Factory Workspace 
e AND | would want to create 4 environments 
= Dev, QA, Stage and Production! 
e Azure Resource Manager (ARM) templates can help you do all 
these with a simple (actually NOT so simple) script! 
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Details 


Web-based user interface. Great to get started BUT NO automation possible. 
Runs in all modern desktop and tablet browsers 


Execute cmdlets (sequence of commands) and create scripts (PowerShell script) 
Recommended for teams familiar with Windows administration 
Cross-platform (Windows, Linux, and macOS) 


Similar to Azure PowerShell BUT uses a different syntax (Bash Scripts) 
Recommended for teams familiar with Linux administration (and Bash Scripts) 
Cross-platform (Windows, Linux, and macOS) 


Free Browser based interactive shell (Access from Azure Portal) 

Common Azure tools pre-installed and configured to use with your account 
Supports both PowerShell and CLI (bash) 

Runs in all modern desktop and tablet browsers 


Azure Resource Hierarchy 


e Hierarchy: Management Group(s) > Subscription (s) > 
Resou rce Group (s) > Resources 
= Resources: VMs, Storage, Databases 
= Resource groups: Organize resources by grouping them into 
Resource groups 
= Subscriptions: Manage costs for resources provisioned for 
different teams or different projects or different business units 


= Management groups: Centralized management for access, 
policy, and compliance across multiple subscriptions 


e Remember: 
= No hierarchy in resource groups BUT management groups can 
have a hierarchy 
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Resource Groups 


e Resource Group: Logical container for resources 
= Associated with a single subscription 


= Can have multiple resources 
o (REMEMBER) A resource can be associated with one and only one resource group 


= Can have resources from multiple regions 
= Deleting it deletes all resources under it 
e Tags assigned to resource group are not automatically applied 


to resources 


= HOWEVER, Permissions/Roles assigned to user at the resource group level 
are inherited by all resources in the group 


e Resource Groups (like Management Groups) are free 
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e You need a Subscription to create resources in Azure 
= Subscription links Azure Account to its resources 


e An Azure Account can have multiple subscriptions and multiple 
account administrators 


e When do you create a new subscription? 


= | want to manage different access-management policies for different environments: 
o Create different subscriptions for different environments 


o Manage distinct Azure subscription policies for each environment 


= | want to manage costs across different departments of an organization: 
o Create different subscriptions for different departments 


o Create separate billing reports and invoices for each subscription (or department) and manage costs 


= |'m exceeding the limits available per subscription 
o Example: VMs per subscription - 25,000 per region 


? 


Subscriptions 


Management Groups 


e Allows you to manage access, policies, and 


compliance across multiple subscriptions 
= Group subscriptions into Management Groups 


= All subscriptions & resources under a Management Group inherit 
all constraints applied to it 


e (REMEMBER) You can create a hierarchy of 
management groups 


e (REMEMBER) All subscriptions in a management group 
should be associated with the same Azure AD tenant 
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Quick Review 
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Service Description 


Azure Queue Decouple applications using a queue (asynchronous communication) 

storage 

Azure Table Store structure data using NoSQL approach (NON-relational). Schemaless. Key/attribute 
storage store. 
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Service Description 
SQL Server on Azure Provides full administrative control over the SQL Server instance and underlying 
Virtual Machines OS for migration to Azure 
Azure SQL Database Fully Managed Service for Microsoft SQL Server. 

Recommended for cloud-born applications 
Azure SQL managed Full (Near 100%) SQL Server access and feature compatibility 
instance Recommended for migrating on-premise SQL Server databases 


Azure SQL managed instance ONLY features: Cross-database queries, Database 
Mail Support, SQL Server Agent etc. 


Azure Database for Fully managed MySQL database 
MySQL 
Azure Database for Fully managed PostgreSQL database 


PostgreSQL 
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Service Description 


Azure Cosmos DB NoSQL database. Globally distributed. 
Core(SQL), MongoDB, Table, Gremlin and Cassandra APIs 


Azure Analytics Services - Quick Review 28 


Service 


Minutes 


Description 


End-to-end analytics solutions 
Data integration + Enterprise data warehouse + Data analytics 
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Get Ready 


Certification Exam 


e Certification Home Page 
= https://docs.microsoft.com/en-gb/learn/certifications/exams/dp-900 


e Different Types of Multiple Choice Questions 
= Type 1: Single Answer - 2/3/4 options and 1 right answer 


= Type 2: Multiple Answer - 5 options and 2 right answers 


No penalty for wrong answers 
= Feel free to guess if you do not know the answer 


e 40-60 questions and 65 minutes 
e Result immediately shown after exam completion 
e Email with detailed scores (a couple of days later) 


Certification Exam - My Recommendations 


Read the entire question 
= Identify the key parts of the question 


Read all answers at least once 
If you do NOT know the answer, eliminate wrong answers first 


Mark questions for future consideration and review them before 
final submission 
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You are all set! 


Let's clap for you! e, 


e You have a lot of patience! Congratulations 


e You have put your best foot forward to get Microsoft Certification - «e 
DP-900: Microsoft Azure Data Fundamentals 

e Make sure you prepare well and 

e Good Luck! 


Do Not Forget! = 


e Recommend the course to your friends! 

= Do not forget to review! «dy 
e Your Success = My Success 

= Share your success story with me on LinkedIn (Ranga Karanam) o 


= Share your success story and lessons learnt in Q&A with other learners! 
Google Cloud 


aWS 
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https://github.com/in28minutes/learn 


e Learn Other Cloud Platforms: 
= Gartner predicts a multi cloud world soon 


= Get certified on AWS, Azure and Google Cloud 
e Learn DevOps (Containers and Container Orchestration) 
e Learn Full Stack Development 


